42 research outputs found

    Parallel and sequential approximation of shortest superstrings

    Full text link

    A framework for research on technology-enhanced special education

    Get PDF
    Based on results from the Technologies for Childrenwith Individual Needs Project and two case projects,we propose a new multidisciplinary framework forresearch between computer science, educationaltechnology, and special education. The frameworkpresents a way to conduct research that aims atdeveloping new methods for technology-enhancedspecial education and for developing adaptablesoftware and hardware tools for individual needs ineducational settings.Peer reviewe

    Fast Searching in Packed Strings

    Get PDF
    Given strings PP and QQ the (exact) string matching problem is to find all positions of substrings in QQ matching PP. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time which is optimal if we can only read one character at the time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let mnm \leq n be the lengths PP and QQ, respectively, and let σ\sigma denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time O\left(\frac{n}{\log_\sigma n} + m + \occ\right). Here \occ is the number of occurrences of PP in QQ. For m=o(n)m = o(n) this improves the O(n)O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if m=O(n/logσn)m = O(n/\log_\sigma n) our algorithm is optimal since any algorithm must spend at least \Omega(\frac{(n+m)\log \sigma}{\log n} + \occ) = \Omega(\frac{n}{\log_\sigma n} + \occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200

    Greedy Shortest Common Superstring Approximation in Compact Space

    Get PDF
    Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and bits of space, where n is the total length of the input strings in characters, and σσ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.Peer reviewe

    Comparing De Novo Genome Assembly: The Long and Short of It

    Get PDF
    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers – both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies – are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium

    Excel as an Algorithm Animation Environment

    No full text
    Understanding of fundamental algorithms and designing algorithms for a novel problem are basic skills in Computer Science. Animation is a useful aid in both these areas. We show how to animate algorithms with Microsoft Excel using data visualization and macro programming features of Excel. The user writes an algorithm using the Visual Basic programming language of Excel and defines charts visualizing dynamically the data structures of the algorithm. This approach is suitable especially for small-scale animation, e.g. for course assignments in Computer Science. 1 Introduction Teaching algorithms is a problem in Computer Science education: most students regard them as abstract black boxes with an irrelevant content. For example, the idea behind a certain sorting method is not interesting for a student. However, usability and efficiency of an algorithm for a particular problem are important aspects to learn. To inspire students to get interested in algorithms is challenging. The idea of..
    corecore